Modelling-Alignment for Non-random Sequences

Authors

  • David R. Powell
  • Lloyd Allison
  • Trevor I. Dix
Abstract

Populations of biased, non-random sequences may cause standard alignment algorithms to yield false-positive matches and false-negative misses. A standard significance test based on the shuffling of sequences is a partial solution, applicable to populations that can be described by simple models. Masking out intervals of low information content throws information away. We describe a new and general method, modelling-alignment: population models are incorporated into the alignment process, which can (and should) lead to changes in the rank-order of matches between a query sequence and a collection of sequences, compared to results from standard algorithms. The new method is general and places very few conditions on the nature of the models that can be used with it. We apply modelling-alignment to local alignment, global alignment, optimal alignment, and the relatedness problem.

Results: As expected, modelling-alignment and the standard prss program from the FASTA package have similar accuracy on sequence populations that can be described by simple models, e.g. 0-order Markov models. However, modelling-alignment has higher accuracy on populations that are mixed or that are described by higher-order models: it gives fewer false positives and false negatives, as shown by ROC curves and other results from tests on real and artificial data.

Availability: An implementation of the software is available via the Web.
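The shuffle-based significance test mentioned in the abstract can be illustrated with a minimal sketch. This is not the prss implementation and not the authors' modelling-alignment method: the function names `global_score` and `shuffle_p_value`, the scoring parameters (match +1, mismatch -1, gap -2), and the use of a plain global alignment score are all assumptions chosen for illustration. Shuffling preserves a sequence's letter composition, so the test only controls for bias that a 0-order model can capture.

```python
import random


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch style) alignment score with a linear gap cost.

    The scoring parameters are illustrative assumptions, not values from the paper.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_p_value(a, b, trials=200, seed=0):
    """Estimate how often a shuffled copy of b scores at least as well as b.

    Shuffling keeps the letter counts of b (its 0-order statistics) and
    destroys any higher-order structure.
    """
    rng = random.Random(seed)
    observed = global_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if global_score(a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

For two identical single-letter runs such as `"AAAAAAAA"`, every shuffle reproduces the original score, so the estimated p-value is 1.0 even though the raw alignment score is perfect. Because shuffling only preserves composition, a test of this kind cannot account for populations described by higher-order models, which is the gap the abstract says modelling-alignment is designed to address.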


Similar resources

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...


An Application of the ABS LX Algorithm to Multiple Sequence Alignment

We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...


A local multiple alignment method for detection of non-coding RNA sequences

MOTIVATION Non-coding RNAs (ncRNAs) show a unique evolutionary process in which the substitutions of distant bases are correlated in order to conserve the secondary structure of the ncRNA molecule. Therefore, the multiple alignment method for the detection of ncRNAs should take into account both the primary sequence and the secondary structure. Recently, there has been intense focus on multiple...


Identifying errors in sequence alignment to improve protein comparative modelling

The difference between the number of known protein sequences and the number of protein structures is vast and comparative modelling offers a way to bridge this gap. Misalignment between target and parent is the largest cause of error in comparative modelling and we define SSMAs (Sequence-Structure MisAlignments) as regions where sequence and structural alignments do not agree. We find that most...


Algorithms for Sequence Alignment

Sequence alignment is an important tool for describing relationships between sequences. Many sequence alignment algorithms exist, differing in efficiency, and in their models of the sequences and of the relationship between sequences. The focus of this thesis is on algorithms for the optimal alignment of two or three sequences of biological data, particularly DNA sequences. The algorithms are d...




Publication date: 2004